Selection of tuning parameters in bridge regression models via Bayesian information criterion

Shuichi Kawano 

Department of Mathematical Sciences, Graduate School of Engineering,
Osaka Prefecture University, 1-1 Gakuen-cho, Sakai, Osaka 599-8531, Japan.

skawano@nis.osakafu-u.ac.jp

Abstract: We consider bridge linear regression modeling, which can produce
a sparse or non-sparse model. A crucial point in the model building process is the
selection of adjusted parameters, including a regularization parameter and a tuning
parameter in bridge regression models. The choice of the adjusted parameters can be
viewed as a model selection and evaluation problem. We propose a model selection
criterion for evaluating bridge regression models from a Bayesian viewpoint.
This selection criterion enables us to select the adjusted parameters objectively.
We investigate the effectiveness of our proposed modeling strategy through some
numerical examples.
1 Introduction

With the appearance of high-throughput data of unexampled size and complexity, sta- 
tistical methods have increasingly become important. In particular, the linear regression 
models are widely used and fundamental tools in statistics. The parameters in the regres- 
sion models are usually estimated by the ordinary least squares (OLS) or the maximum 
likelihood method. However, the models estimated by these methods often yield unstable
parameter estimates and large prediction errors when multicollinearity is present in the
regression models.

In order to overcome the problem, various penalized regression methods, e.g., the ridge 
regression (Hoerl and Kennard, 1970), the lasso (Tibshirani, 1996), the bridge regression 
(Frank and Friedman, 1993), the elastic net (Zou and Hastie, 2005), the SCAD (Fan 
and Li, 2001) and the MCP (Zhang, 2010), have been proposed. Among the penalized
methods, we focus on bridge linear regression, that is, the linear regression model
estimated by the penalized method with the bridge penalty. An advantage of bridge
regression is that, by controlling a tuning parameter included in the penalty function,
it can produce either a sparse model, which has received considerable attention in
high-dimensional data analysis and has been studied extensively in the recent machine
learning and statistical literature (see, e.g., Bühlmann and van de Geer, 2011), or a
non-sparse model. Also, many studies (e.g., Armagan, 2009; Fu, 1998; Huang et al., 2008;
Knight and Fu, 2000) have shown that bridge regression models are useful from both
practical and theoretical perspectives. There remains, however, the problem of evaluating
bridge regression models, which amounts to selecting the adjusted parameters involved in
the constructed models. Cross-validation (CV) is often used for this evaluation. The
computational time of CV, however, tends to be very large, and its high variability and
tendency to undersmooth are not negligible, since the selectors are repeatedly applied.

In this paper, we present a model selection criterion, derived from a Bayesian viewpoint,
for evaluating models estimated by the penalized maximum likelihood method with the
bridge penalty. The proposed criterion enables us to select appropriate values of the
adjusted parameters in bridge regression models objectively. Through some numerical
studies, we investigate the performance of our proposed procedure.

This paper is organized as follows. Section 2 describes the bridge linear regression
models together with an estimation algorithm. In Section 3, we introduce a model selection
criterion derived from a Bayesian viewpoint to choose the adjusted parameters in the models.
Section 4 conducts Monte Carlo simulations and a real data analysis to examine the
performance of our proposed strategy and to compare several types of criteria and methods.
Some concluding remarks are given in Section 5.

2 Bridge regression modeling 

2.1 Preliminary 

Suppose that we have a data set $\{(y_i, x_i);\ i = 1, \ldots, n\}$, where $y_i \in \mathbb{R}$ is a response
variable and $x_i = (x_{i1}, \ldots, x_{ip})^T$ denotes a $p$-dimensional covariate vector. Without loss
of generality, it is assumed that the response is centered and the covariates are standardized,
that is,

$$\sum_{i=1}^{n} y_i = 0, \quad \sum_{i=1}^{n} x_{ij} = 0, \quad \sum_{i=1}^{n} x_{ij}^2 = n, \quad j = 1, \ldots, p. \tag{1}$$
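As a minimal sketch of this preprocessing, assuming the data are held in NumPy arrays `X` ($n \times p$) and `y` (length $n$), the constraints in Equation (1) can be imposed as follows. The function name `center_and_standardize` and the simulated inputs are illustrative assumptions, not part of the paper.

```python
import numpy as np

def center_and_standardize(X, y):
    """Return (X_std, y_centered) satisfying the constraints of Equation (1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    y_centered = y - y.mean()            # sum_i y_i = 0
    X_centered = X - X.mean(axis=0)      # sum_i x_ij = 0 for every column j
    # Divide by the root mean square so that sum_i x_ij^2 = n for every column j.
    rms = np.sqrt((X_centered ** 2).sum(axis=0) / n)
    return X_centered / rms, y_centered

# Illustrative data only (not from the paper).
rng = np.random.default_rng(0)
X_raw = 2.0 + rng.normal(size=(20, 3)) * np.array([1.0, 5.0, 0.1])
y_raw = rng.normal(size=20)
X_std, y_ctr = center_and_standardize(X_raw, y_raw)
```

Scaling by the root mean square, rather than by the sample standard deviation with divisor $n - 1$, is what makes each column's sum of squares equal exactly $n$.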

In order to capture the relationship between the response $y_i$ and the covariate vector $x_i$, we
consider the linear regression model

$$y = X\beta + \varepsilon, \tag{2}$$

where $y = (y_1, \ldots, y_n)^T$ is an $n$-dimensional response vector, $X$ is an $n \times p$ design matrix,
$\beta = (\beta_1, \ldots, \beta_p)^T$ is a $p$-dimensional coefficient vector and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)^T$ is an
$n$-dimensional error vector. In addition, we assume that the errors $\varepsilon_i$ ($i = 1, \ldots, n$) are
independently distributed as the normal distribution with mean zero and variance $\sigma^2$.

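From these assumptions, we have a probability density function for the response
$y_i$ in the following:

$$f(y_i | x_i; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(y_i - x_i^T\beta)^2}{2\sigma^2} \right\}, \quad i = 1, \ldots, n, \tag{3}$$

where $\theta = (\beta^T, \sigma^2)^T$ is a parameter vector to be estimated. This leads to a log-likelihood
function given by

$$\ell(\theta) = \sum_{i=1}^{n} \log f(y_i | x_i; \theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^T\beta)^2. \tag{4}$$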
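The log-likelihood in Equation (4) is straightforward to evaluate numerically. The following is a small sketch under the assumption that the data are NumPy arrays; the function name `gaussian_log_likelihood` is hypothetical.

```python
import numpy as np

def gaussian_log_likelihood(y, X, beta, sigma2):
    """Log-likelihood of Equation (4) for theta = (beta, sigma2)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = y.shape[0]
    resid = y - X @ beta
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - (resid @ resid) / (2.0 * sigma2)

# Tiny illustrative check: two observations, one covariate, residuals (-0.5, 0.5).
value = gaussian_log_likelihood([1.0, 2.0], [[1.0], [1.0]], [1.5], 1.0)
```

For this toy input the residual sum of squares is 0.5, so the value is $-\log(2\pi) - 0.25$.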



2.2 Estimation via the bridge penalty 

The unknown parameter $\theta$ is estimated by the penalized maximum likelihood method,
that is, by maximizing a penalized log-likelihood function

$$\ell_\lambda(\theta) = \ell(\theta) - n \sum_{j=1}^{p} p_\lambda(|\beta_j|), \tag{5}$$

where $p_\lambda(\cdot)$ is a penalty function and $\lambda$ ($> 0$) is a regularization parameter. Until now,
many penalty functions have been proposed: e.g., the $L_2$ penalty or the ridge penalty
(Hoerl and Kennard, 1970) given by $p_\lambda(\beta) = \lambda\beta^2/2$, the $L_1$ penalty or the lasso penalty
(Tibshirani, 1996) given by $p_\lambda(\beta) = \lambda|\beta|$, the elastic net penalty (Zou and Hastie, 2005)
given by $p_\lambda(\beta) = \lambda\{\alpha\beta^2/2 + (1 - \alpha)|\beta|\}$, where $\alpha$ ($0 \le \alpha \le 1$) is a tuning parameter, the
SCAD penalty (Fan and Li, 2001) given by $p'_\lambda(\beta) = \lambda[I(|\beta| \le \lambda) + (a\lambda - |\beta|)_+ I(|\beta| > \lambda)/\{(a - 1)\lambda\}]$,
where $a$ ($> 2$) is a tuning parameter and $(x)_+ = \max(0, x)$, and the MCP
(Zhang, 2010) given by $p'_\lambda(\beta) = (a\lambda - |\beta|)_+/a$, where $a$ ($> 0$) is a tuning parameter.
Note that the ridge penalty, the lasso penalty and the elastic net penalty are convex
functions, while the SCAD penalty and the MCP are non-convex, and the lasso penalty,
the elastic net penalty, the SCAD penalty and the MCP can produce sparse solutions for
coefficient parameters, while the ridge penalty cannot. For more penalty functions, we
refer to Antoniadis et al. (2011) and Lv and Fan (2009).

In this paper, since we consider bridge regression models, the penalty $p_\lambda(\cdot)$ takes the
form $\lambda|\beta|^q/2$, called the bridge penalty (Frank and Friedman, 1993), and then
we obtain

$$\ell_\lambda(\theta) = \ell(\theta) - \frac{n\lambda}{2} \sum_{j=1}^{p} |\beta_j|^q, \tag{6}$$

where $q$ ($> 0$) is a tuning parameter. It is clear that the bridge penalty is the $L_1$ penalty
when $q = 1$ and is the $L_2$ penalty when $q = 2$. Also, it is known that the bridge penalty
yields sparse models if $0 < q \le 1$, while the penalty yields non-sparse models if $q > 1$.
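To make the role of $q$ concrete, the sketch below evaluates the bridge penalty $\lambda|\beta|^q/2$ for a few values of $q$. The function name `bridge_penalty` and the numerical inputs are illustrative assumptions.

```python
import numpy as np

def bridge_penalty(beta, lam, q):
    """Bridge penalty lam * |beta|**q / 2, applied elementwise."""
    return lam * np.abs(np.asarray(beta, dtype=float)) ** q / 2.0

lam = 3.0
ridge_like = bridge_penalty(2.0, lam, q=2.0)  # lam * beta^2 / 2, the ridge form
lasso_like = bridge_penalty(2.0, lam, q=1.0)  # lam * |beta| / 2, the L1 form up to the factor 1/2

# For 0 < q < 1 the penalty is concave on beta > 0, hence non-convex overall:
# its value at a midpoint exceeds the average of the endpoint values.
a, b = 1.0, 4.0
midpoint_value = bridge_penalty((a + b) / 2.0, lam, q=0.5)
chord_value = (bridge_penalty(a, lam, q=0.5) + bridge_penalty(b, lam, q=0.5)) / 2.0
```

The midpoint comparison is only a spot check of non-convexity at one pair of points, not a proof.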

There is a large body of research on bridge regression. Armagan (2009), Fu (1998)
and Zou and Li (2008) proposed efficient algorithms for solving bridge regression models.
Huang et al. (2008) and Knight and Fu (2000) showed asymptotic properties for linear
regression models with the bridge penalty. Huang et al. (2009) and Park and Yoon (2011)
extended the bridge penalty into the group bridge penalty, which is an extension of the
group lasso penalty presented by Yuan and Lin (2006).

Since the bridge penalty is a convex function when $q \ge 1$, maximizing Equation (6) is a concave
optimization problem. Hence, in order to obtain estimators of coefficient parameters, we
can use usual optimization algorithms, e.g., the shooting algorithm (Fu, 1998). However,
since the bridge penalty is non-convex when $0 < q < 1$, maximizing Equation (6) is then a
non-concave optimization problem. Thus, we need to approximate the bridge penalty by a convex
function. We apply the local quadratic approximation (LQA) introduced by Fan and Li
(2001) to the bridge penalty.

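For the LQA, under some conditions, the penalty function can be approximated at
initial values $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_p^{(0)})^T$ in the form

$$p_\lambda(|\beta_j|) \approx p_\lambda(|\beta_j^{(0)}|) + \frac{1}{2} \frac{p'_\lambda(|\beta_j^{(0)}|)}{|\beta_j^{(0)}|} \left( \beta_j^2 - \beta_j^{(0)2} \right). \tag{7}$$

Then, up to a constant that does not depend on $\beta$, Equation (6) can be expressed as

$$\ell^*_\lambda(\theta) = \ell(\theta) - \frac{n\lambda q}{4} \sum_{j=1}^{p} |\beta_j^{(0)}|^{q-2} \beta_j^2. \tag{8}$$

This formulation is clearly a concave optimization problem, since the bridge penalty is
replaced with a quadratic function of the coefficient parameters $\beta_j$ ($j = 1, \ldots, p$),
and hence it is easy to obtain estimators of the parameters $\theta$. The estimators of the parameters
$\theta$ can be derived according to the following algorithm: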

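Step 1 Set the values of the regularization parameter $\lambda$ and the tuning parameter $q$,
respectively.

Step 2 Initialize $\beta^{(0)} = (\beta_1^{(0)}, \ldots, \beta_p^{(0)})^T$ and $\sigma^{(0)2}$. In our numerical studies, we set

$$\beta^{(0)} = (X^T X + n\gamma I_p)^{-1} X^T y, \quad \sigma^{(0)2} = 1, \tag{9}$$

where $\gamma$ is a small positive constant (a negative power of ten) and $I_p$ is a $p \times p$ identity matrix.

Step 3 Update the coefficient vector $\beta$ as follows:

$$\beta^{(k+1)} = \left( X^T X + \Sigma_{\lambda,q}(\beta^{(k)}) \right)^{-1} X^T y, \quad k = 0, 1, 2, \ldots, \tag{10}$$

where $\Sigma_{\lambda,q}(\beta^{(k)}) = \mathrm{diag}\left( n\lambda\sigma^{(k)2} q |\beta_1^{(k)}|^{q-2}/4, \ldots, n\lambda\sigma^{(k)2} q |\beta_p^{(k)}|^{q-2}/4 \right)$.

Step 4 Update the parameter $\sigma^2$ in the form

$$\sigma^{(k+1)2} = \frac{1}{n} (y - X\beta^{(k+1)})^T (y - X\beta^{(k+1)}). \tag{11}$$

Step 5 Repeat Step 3 and Step 4 until the following condition

$$\| \beta^{(k+1)} - \beta^{(k)} \| < \delta \tag{12}$$

is satisfied, where $\delta$ is an arbitrarily small number (a small negative power of ten in our
numerical examples).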
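The steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the author's implementation: the default values `gamma=1e-5` and `delta=1e-5`, the starting value 1 for the variance, the cap `max_iter`, the use of the coefficient vector in the stopping rule, and the rule that coefficients with magnitude below `zero_tol` are fixed at zero (to avoid dividing by zero when $q < 2$) are all choices made for this sketch. The function name `fit_bridge_lqa` is hypothetical.

```python
import numpy as np

def fit_bridge_lqa(X, y, lam, q, gamma=1e-5, delta=1e-5, max_iter=500, zero_tol=1e-8):
    """Bridge estimate via local quadratic approximation, following Steps 1 to 5."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Step 2: ridge-type starting value for beta, and an assumed starting variance of 1.
    beta = np.linalg.solve(X.T @ X + n * gamma * np.eye(p), X.T @ y)
    sigma2 = 1.0
    for _ in range(max_iter):
        # Step 3: generalized ridge update with weights built from the current beta.
        active = np.abs(beta) > zero_tol          # assumption: tiny coefficients stay at zero
        beta_new = np.zeros(p)
        if active.any():
            X_a = X[:, active]
            weights = n * lam * sigma2 * q * np.abs(beta[active]) ** (q - 2.0) / 4.0
            beta_new[active] = np.linalg.solve(X_a.T @ X_a + np.diag(weights), X_a.T @ y)
        # Step 4: variance update from the new residuals.
        resid = y - X @ beta_new
        sigma2_new = resid @ resid / n
        # Step 5: stop when the coefficient vector has stabilized.
        converged = np.linalg.norm(beta_new - beta) < delta
        beta, sigma2 = beta_new, sigma2_new
        if converged:
            break
    return beta, sigma2

# Illustrative data only (not from the paper).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(30, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=30)
beta_ols_like, sigma2_ols_like = fit_bridge_lqa(X_demo, y_demo, lam=0.0, q=1.0)
beta_sparse, sigma2_sparse = fit_bridge_lqa(X_demo, y_demo, lam=0.1, q=0.5)
```

With `lam=0.0` the penalty weights vanish and the iteration returns the ordinary least squares solution, which gives a simple sanity check of the update in Step 3.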

From these procedures, we obtain the estimator $\hat\theta = (\hat\beta^T, \hat\sigma^2)^T$, and then we derive
a statistical model

$$f(y_i | x_i; \hat\theta) = \frac{1}{\sqrt{2\pi\hat\sigma^2}} \exp\left\{ -\frac{(y_i - x_i^T\hat\beta)^2}{2\hat\sigma^2} \right\}, \quad i = 1, \ldots, n. \tag{13}$$

The statistical model includes some adjusted parameters, i.e., the regularization parameter
$\lambda$ and the tuning parameter $q$. In order to choose these parameters objectively, we
introduce a model selection criterion from a Bayesian viewpoint.

3 Model selection criteria 

3.1 Proposed criterion 

Schwarz (1978) proposed the Bayesian information criterion (BIC) from the aspect of
Bayesian theory. The BIC, however, applies only to models estimated by the maximum
likelihood method. Konishi et al. (2004) extended the BIC so that it can be used for
evaluating statistical models estimated by the penalized maximum likelihood method.

The Bayesian approach is to select the values of the regularization parameter $\lambda$ and the tuning
parameter $q$ that maximize the marginal likelihood. The marginal likelihood is calculated
by integrating over the unknown parameter $\theta$ and is defined by

$$\int \prod_{i=1}^{n} f(y_i | x_i; \theta)\, \pi(\theta)\, d\theta = \int \prod_{i=1}^{n} f(y_i | x_i; \theta)\, \pi(\beta | \sigma^2)\, \pi(\sigma^2)\, d\theta, \tag{14}$$

where $\pi(\theta) = \pi(\beta | \sigma^2)\pi(\sigma^2)$ is the prior distribution of the parameter $\theta$. In bridge
regression models, the prior distribution $\pi(\sigma^2)$ is assumed to be a non-informative prior
distribution and the prior distribution $\pi(\beta | \sigma^2) = \pi(\beta)$ can be found in Fu (1998) as
follows:

$$\pi(\beta | \lambda, q) = \prod_{j=1}^{p} \pi(\beta_j | \lambda, q) = \prod_{j=1}^{p} \frac{q\, 2^{-(1+1/q)} (n\lambda)^{1/q}}{\Gamma(1/q)} \exp\left\{ -\frac{n\lambda}{2} |\beta_j|^q \right\}, \tag{15}$$

where $\Gamma(\cdot)$ is the Gamma function.
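As a quick numerical sanity check of the normalizing constant in Equation (15), the sketch below integrates the one-dimensional prior density $\pi(\beta_j | \lambda, q)$ with the trapezoidal rule. The values of $\lambda$, $q$ and $n$, the integration grid and the function name `bridge_prior_density` are arbitrary illustrative choices.

```python
import math
import numpy as np

def bridge_prior_density(beta, lam, q, n):
    """One-dimensional prior density pi(beta_j | lam, q) from Equation (15)."""
    const = q * 2.0 ** (-(1.0 + 1.0 / q)) * (n * lam) ** (1.0 / q) / math.gamma(1.0 / q)
    return const * np.exp(-0.5 * n * lam * np.abs(beta) ** q)

# Trapezoidal integration on a wide, fine grid; the tails beyond |beta| = 50 are negligible here.
grid = np.linspace(-50.0, 50.0, 400001)
dens = bridge_prior_density(grid, lam=0.1, q=1.5, n=20)
area = float(np.sum(0.5 * (dens[1:] + dens[:-1]) * np.diff(grid)))
```

The computed area should be very close to one, which is what the factor $q\,2^{-(1+1/q)}(n\lambda)^{1/q}/\Gamma(1/q)$ is designed to ensure.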

In general, it is difficult to evaluate Equation (14), since we must often calculate
a high-dimensional integral. Hence, some approximation methods are usually applied
to the integral, for example, the Laplace approximation (Tierney and Kadane, 1986).
However, in situations where some components of $\beta$ are exactly zero with bridge
approaches, the integrand in (14) is not differentiable at the origin, and then
the approximation methods cannot be directly applied.

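Let $\mathcal{A} = \{j;\ \beta_j \ne 0\}$ be the active set of the parameter $\beta$. In order to overcome the
problem, we consider the partial marginal likelihood given by

$$\int \prod_{i=1}^{n} f(y_i | x_i; \theta)\, \pi(\beta | \lambda, q)\, d\theta_{\mathcal{A}}, \tag{16}$$

where $\theta_{\mathcal{A}} = (\beta_{\mathcal{A}}^T, \sigma^2)^T$. Here $\beta_{\mathcal{A}} = (\beta_{k_1}, \ldots, \beta_{k_r})^T$, where we set $\mathcal{A} = \{k_1, \ldots, k_r\}$ and
$k_1 < \cdots < k_r$. The quantity is calculated by integrating over the unknown parameter $\theta_{\mathcal{A}}$
associated with the active set $\mathcal{A}$. Applying the Laplace approximation to Equation
(16), we obtain

$$\int \prod_{i=1}^{n} f(y_i | x_i; \theta)\, \pi(\beta | \lambda, q)\, d\theta_{\mathcal{A}} = \frac{(2\pi)^{(|\mathcal{A}|+1)/2}}{n^{(|\mathcal{A}|+1)/2}\, |V(\hat\theta_{\mathcal{A}})|^{1/2}} \exp\{ n v(\hat\theta_{\mathcal{A}}) \} \left\{ 1 + O_p(n^{-1}) \right\}, \tag{17}$$

where

$$v(\theta) = \frac{1}{n} \log\left\{ \prod_{i=1}^{n} f(y_i | x_i; \theta)\, \pi(\beta | \lambda, q) \right\}, \quad V(\theta) = -\frac{\partial^2 v(\theta)}{\partial\theta\, \partial\theta^T},$$

and $\hat\theta_{\mathcal{A}} = (\hat\beta_{\mathcal{A}}^T, \hat\sigma^2)^T$, where $\hat\beta_{\mathcal{A}}$ is the estimator of the coefficient $\beta_{\mathcal{A}}$.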
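The behavior of the Laplace approximation, including the relative error of order $n^{-1}$, can be seen in a one-dimensional toy case that is unrelated to the regression model. In this sketch, which is an illustration chosen for this purpose, $v(t) = \log t - t$ on $t > 0$, so that $\int_0^\infty \exp\{n v(t)\}\,dt = \Gamma(n+1)/n^{n+1}$ exactly, while the approximation uses the mode $\hat t = 1$, $v(\hat t) = -1$ and $V = -v''(\hat t) = 1$.

```python
import math

def log_exact_integral(n):
    """log of int_0^inf exp{n v(t)} dt with v(t) = log t - t, i.e. log Gamma(n+1) - (n+1) log n."""
    return math.lgamma(n + 1.0) - (n + 1.0) * math.log(n)

def log_laplace_integral(n):
    """One-dimensional Laplace approximation: (2*pi/n)^(1/2) * exp{n v(t_hat)} with V = 1."""
    return 0.5 * math.log(2.0 * math.pi / n) - n

# Ratio exact / approximate for increasing n; it behaves like 1 + 1/(12 n).
ratios = {n: math.exp(log_exact_integral(n) - log_laplace_integral(n)) for n in (10, 50, 250)}
```

The ratio of the exact value to the approximation approaches one at the rate $1/(12n)$, which is the familiar correction term in Stirling's formula.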

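By taking the logarithm of the formula calculated by the Laplace approximation,
Konishi et al. (2004) presented the generalized Bayesian information criterion (GBIC) to
evaluate models estimated by the penalized maximum likelihood method. Using the result
of Konishi et al. (2004, p. 30), we derive a model selection criterion

$$\begin{aligned}
\mathrm{GBIC} ={}& n\log(2\pi) + n\log\hat\sigma^2 + n - (|\mathcal{A}| + 1)\log\left(\frac{2\pi}{n}\right) + \log|J| + n\lambda\sum_{j \in \mathcal{A}} |\hat\beta_j|^q \\
& - 2|\mathcal{A}|\log q + 2|\mathcal{A}|\left(1 + \frac{1}{q}\right)\log 2 - \frac{2|\mathcal{A}|}{q}\log(n\lambda) + 2|\mathcal{A}|\log\Gamma\left(\frac{1}{q}\right),
\end{aligned} \tag{18}$$

where $J$ is a $(|\mathcal{A}| + 1) \times (|\mathcal{A}| + 1)$ matrix given by

$$J = \frac{1}{n\hat\sigma^2} \begin{pmatrix} X_{\mathcal{A}}^T X_{\mathcal{A}} + n\lambda\hat\sigma^2 q(q - 1)K & \dfrac{1}{\hat\sigma^2} X_{\mathcal{A}}^T \Lambda 1_n \\[2mm] \dfrac{1}{\hat\sigma^2} 1_n^T \Lambda X_{\mathcal{A}} & \dfrac{n}{2\hat\sigma^2} \end{pmatrix}, \tag{19}$$

$K = \mathrm{diag}(|\hat\beta_{k_1}|^{q-2}/2, \ldots, |\hat\beta_{k_r}|^{q-2}/2)$ and

$$X_{\mathcal{A}} = (x_{ik};\ i = 1, \ldots, n;\ k \in \mathcal{A}). \tag{20}$$

Here $1_n = (1, \ldots, 1)^T$ is an $n$-dimensional vector and $\Lambda = \mathrm{diag}(y_1 - x_1^T\hat\beta, \ldots, y_n - x_n^T\hat\beta)$.

We choose the adjusted parameters, i.e., the values of the regularization parameter $\lambda$ and the
tuning parameter $q$, as the minimizer of the GBIC in Equation (18).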
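The criterion can be evaluated for a given fit along the following lines. This is a hedged sketch, not the author's code: it assumes that minus twice the logarithm of the Laplace-approximated partial marginal likelihood is taken term by term, so that the prior contributes both its normalizing constant and the kernel term $n\lambda\sum_{j \in \mathcal{A}}|\hat\beta_j|^q$, and that $\hat\sigma^2$ equals the residual sum of squares divided by $n$. The function name `gbic`, the threshold `zero_tol` for deciding which coefficients are active, and the choice to return infinity when $J$ is not positive in determinant are assumptions of this sketch.

```python
import math
import numpy as np

def gbic(X, y, beta_hat, sigma2_hat, lam, q, zero_tol=1e-8):
    """GBIC for a bridge fit; smaller is better. Requires lam > 0 and q > 0."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    n = X.shape[0]
    active = np.abs(beta_hat) > zero_tol            # active set A
    r = int(active.sum())                           # |A|
    X_a, beta_a = X[:, active], beta_hat[active]
    resid = y - X_a @ beta_a

    # Matrix J: negative Hessian of v at the estimate, with respect to (beta_A, sigma^2).
    K = np.diag(np.abs(beta_a) ** (q - 2.0) / 2.0)
    J = np.empty((r + 1, r + 1))
    J[:r, :r] = X_a.T @ X_a + n * lam * sigma2_hat * q * (q - 1.0) * K
    cross = X_a.T @ resid / sigma2_hat              # X_A^T Lambda 1_n / sigma^2
    J[:r, r] = cross
    J[r, :r] = cross
    J[r, r] = n / (2.0 * sigma2_hat)
    J /= n * sigma2_hat

    sign, logdet = np.linalg.slogdet(J)
    if sign <= 0:                                   # assumption: treat such fits as inadmissible
        return float("inf")

    kernel_term = n * lam * np.sum(np.abs(beta_a) ** q)
    prior_const = (-2.0 * r * math.log(q) + 2.0 * r * (1.0 + 1.0 / q) * math.log(2.0)
                   - (2.0 * r / q) * math.log(n * lam) + 2.0 * r * math.lgamma(1.0 / q))
    value = (n * math.log(2.0 * math.pi) + n * math.log(sigma2_hat) + n
             - (r + 1) * math.log(2.0 * math.pi / n) + logdet + kernel_term + prior_const)
    return float(value)

# Hand-checkable toy case: n = 2, one active coefficient, residuals (1, 1), sigma2_hat = 1.
toy_value = gbic(np.array([[1.0], [-1.0]]), np.array([2.0, 0.0]), np.array([1.0]), 1.0, lam=0.5, q=2.0)
```

In the toy case $J = \mathrm{diag}(1.5, 0.5)$, and the terms sum to $3 + 3\log 2 + \log\pi + \log 0.75$. In practice the function would be called for every candidate pair $(\lambda, q)$ after fitting the model, and the pair with the smallest value would be kept.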

3.2 Other criteria 

This section describes other selection criteria for choosing the adjusted parameters included
in bridge regression models.



3.2.1 Modified AIC and modified BIC 

As an approximation of the effective degrees of freedom in model selection theory,
Hastie and Tibshirani (1990) proposed to use the trace of the hat matrix. In bridge
regression models, the hat matrix is given by $S = X_{\mathcal{A}}(X_{\mathcal{A}}^T X_{\mathcal{A}} + \Sigma_{\lambda,q}(\hat\beta_{\mathcal{A}}))^{-1} X_{\mathcal{A}}^T$, where
$\Sigma_{\lambda,q}(\hat\beta_{\mathcal{A}}) = \mathrm{diag}(n\lambda\hat\sigma^2 q|\hat\beta_{k_1}|^{q-2}/4, \ldots, n\lambda\hat\sigma^2 q|\hat\beta_{k_r}|^{q-2}/4)$. By replacing the number of
parameters in AIC (Akaike, 1974) and BIC (Schwarz, 1978) with the trace of the hat matrix
$S$, we obtain the modified AIC and modified BIC, respectively,

$$\mathrm{mAIC} = -2\sum_{i=1}^{n} \log f(y_i | x_i; \hat\theta) + 2\,\mathrm{tr}\,S, \tag{21}$$

$$\mathrm{mBIC} = -2\sum_{i=1}^{n} \log f(y_i | x_i; \hat\theta) + (\mathrm{tr}\,S)\log n. \tag{22}$$

A problem may arise in the theoretical justification for the use of the bias-correction term,
since the AIC and the BIC only cover statistical models estimated by the maximum
likelihood method.
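A minimal sketch of the effective degrees of freedom $\mathrm{tr}\,S$ and of Equations (21) and (22) is given below, assuming NumPy arrays and a fitted pair $(\hat\beta, \hat\sigma^2)$. The function names, the threshold `zero_tol` defining the active set and the simulated data are assumptions made for illustration.

```python
import numpy as np

def bridge_hat_matrix(X, beta_hat, sigma2_hat, lam, q, zero_tol=1e-8):
    """Hat matrix S = X_A (X_A^T X_A + Sigma)^{-1} X_A^T restricted to the active set."""
    X = np.asarray(X, dtype=float)
    beta_hat = np.asarray(beta_hat, dtype=float)
    n = X.shape[0]
    active = np.abs(beta_hat) > zero_tol
    X_a = X[:, active]
    weights = n * lam * sigma2_hat * q * np.abs(beta_hat[active]) ** (q - 2.0) / 4.0
    return X_a @ np.linalg.solve(X_a.T @ X_a + np.diag(weights), X_a.T)

def modified_aic_bic(X, y, beta_hat, sigma2_hat, lam, q):
    """Return (mAIC, mBIC, tr S) as in Equations (21) and (22)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    resid = y - X @ np.asarray(beta_hat, dtype=float)
    neg2_loglik = n * np.log(2.0 * np.pi * sigma2_hat) + resid @ resid / sigma2_hat
    df = np.trace(bridge_hat_matrix(X, beta_hat, sigma2_hat, lam, q))
    return neg2_loglik + 2.0 * df, neg2_loglik + df * np.log(n), df

# Illustrative data only; the least squares fit stands in for a bridge fit here.
rng = np.random.default_rng(5)
X_demo = rng.normal(size=(40, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=40)
beta_demo = np.linalg.lstsq(X_demo, y_demo, rcond=None)[0]
sigma2_demo = np.sum((y_demo - X_demo @ beta_demo) ** 2) / 40
maic0, mbic0, df0 = modified_aic_bic(X_demo, y_demo, beta_demo, sigma2_demo, lam=0.0, q=1.0)
maic1, mbic1, df1 = modified_aic_bic(X_demo, y_demo, beta_demo, sigma2_demo, lam=0.5, q=1.0)
```

With `lam=0.0` the hat matrix is an ordinary projection and $\mathrm{tr}\,S$ equals the number of active covariates; any positive $\lambda$ shrinks the trace below that number.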

3.2.2 Bias corrected AIC 

Hurvich and Tsai (1989) and Sugiura (1978) proposed an improved version of AIC in the
context of linear regression models and autoregressive time series models estimated by the
maximum likelihood method. Hurvich et al. (1998) proposed replacing the number of
parameters in the improved version of AIC with the trace of the hat matrix and introduced
the criterion

$$\mathrm{AICc} = -2\sum_{i=1}^{n} \log f(y_i | x_i; \hat\theta) + \frac{2n(\mathrm{tr}\,S + 1)}{n - \mathrm{tr}\,S - 2}. \tag{23}$$
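For illustration, the bias-corrected criterion can be written as a small helper that takes minus twice the maximized log-likelihood, the trace of the hat matrix and the sample size. The function name `aicc` and the numerical inputs are hypothetical, and the penalty form $2n(\mathrm{tr}\,S + 1)/(n - \mathrm{tr}\,S - 2)$ is the one assumed in this sketch.

```python
def aicc(neg2_loglik, df, n):
    """Bias-corrected AIC with df = tr S in place of the number of parameters."""
    if n - df - 2.0 <= 0.0:
        raise ValueError("AICc requires n - df - 2 > 0")
    return neg2_loglik + 2.0 * n * (df + 1.0) / (n - df - 2.0)

small_sample = aicc(10.0, 3.0, 20)        # penalty 2 * 20 * 4 / 15
large_sample = aicc(0.0, 3.0, 10 ** 6)    # penalty close to 2 * (df + 1) = 8
```

The second call shows that the correction vanishes for large $n$, where the penalty approaches $2(\mathrm{tr}\,S + 1)$.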

3.2.3 Cross-validation and generalized cross-validation 

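The cross-validation evaluates a statistical model for each observation by using the
remaining data. Let $\hat y^{(-i)}$ be the response value predicted for the $i$-th observation from the
model estimated by the observed data except $(y_i, x_i)$. The cross-validation criterion is then

$$\mathrm{CV} = \frac{1}{n}\sum_{i=1}^{n} \left\{ y_i - \hat y^{(-i)} \right\}^2 = \frac{1}{n}\sum_{i=1}^{n} \left( \frac{y_i - \hat y_i}{1 - s_{ii}} \right)^2, \tag{24}$$

where $\hat y_1, \ldots, \hat y_n$ are fitted values and $s_{ii}$ is the $i$-th diagonal element of the hat matrix $S$.

Craven and Wahba (1979) proposed the generalized cross-validation by replacing the
value $s_{ii}$ in Equation (24) with the averaged trace of the hat matrix, $\mathrm{tr}\,S/n$, as follows:

$$\mathrm{GCV} = \frac{1}{n}\sum_{i=1}^{n} \left( \frac{y_i - \hat y_i}{1 - n^{-1}\mathrm{tr}\,S} \right)^2. \tag{25}$$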
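The hat-matrix forms of both criteria can be sketched as follows. The example uses a linear smoother with a fixed diagonal penalty matrix `D`, which is an illustrative stand-in: for such a smoother the leave-one-out shortcut is exact, whereas for the bridge fit the weights depend on $\hat\beta$, so the shortcut should be read as an approximation there. The function name `cv_and_gcv` is hypothetical.

```python
import numpy as np

def cv_and_gcv(y, S):
    """CV and GCV computed from the hat matrix S of a linear smoother y_hat = S y."""
    y = np.asarray(y, dtype=float)
    S = np.asarray(S, dtype=float)
    n = y.shape[0]
    resid = y - S @ y
    cv = np.mean((resid / (1.0 - np.diag(S))) ** 2)
    gcv = np.mean((resid / (1.0 - np.trace(S) / n)) ** 2)
    return cv, gcv

# Illustrative data and fixed penalty weights (not from the paper).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(25, 4))
y_demo = X_demo @ np.array([1.0, 0.0, -2.0, 0.5]) + rng.normal(size=25)
D = np.diag([0.5, 2.0, 0.5, 1.0])
S_demo = X_demo @ np.linalg.solve(X_demo.T @ X_demo + D, X_demo.T)
cv_value, gcv_value = cv_and_gcv(y_demo, S_demo)
```

The shortcut avoids refitting the model $n$ times, which is the main practical appeal of both criteria.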



3.2.4 Extended information criterion 

Let $\{(y_i^{(b)}, x_i^{(b)});\ i = 1, \ldots, n\}$ ($b = 1, \ldots, B$) be the $b$-th bootstrap sample obtained by resampling,
and $\hat\theta^{(b)}$ be the bridge estimator based on the $b$-th bootstrap sample. The extended
information criterion proposed by Ishiguro et al. (1997) is then defined by

$$\mathrm{EIC} = -2\sum_{i=1}^{n} \log f(y_i | x_i; \hat\theta) + \frac{2}{B}\sum_{b=1}^{B}\sum_{i=1}^{n} \left\{ \log f(y_i^{(b)} | x_i^{(b)}; \hat\theta^{(b)}) - \log f(y_i | x_i; \hat\theta^{(b)}) \right\}. \tag{26}$$

In our numerical experiments, $B$ is set to 100.
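The bootstrap bias correction can be sketched generically, with the estimator passed in as a function. In this illustration the estimator is ordinary least squares with the maximum likelihood variance, which is a stand-in assumed for brevity; a bridge fit would be supplied in the same way. The names `eic`, `ols_ml_fit` and `gaussian_loglik`, the seed and the simulated data are all hypothetical.

```python
import numpy as np

def gaussian_loglik(y, X, beta, sigma2):
    """Gaussian log-likelihood summed over the observations."""
    resid = y - X @ beta
    return -0.5 * len(y) * np.log(2.0 * np.pi * sigma2) - resid @ resid / (2.0 * sigma2)

def ols_ml_fit(X, y):
    """Stand-in estimator: least squares coefficients and the ML variance estimate."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, resid @ resid / len(y)

def eic(X, y, fit, B=100, seed=0):
    """Extended information criterion with a bootstrap estimate of the bias."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat, sigma2_hat = fit(X, y)
    bias_sum = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, size=n)            # resample rows with replacement
        beta_b, sigma2_b = fit(X[idx], y[idx])
        bias_sum += (gaussian_loglik(y[idx], X[idx], beta_b, sigma2_b)
                     - gaussian_loglik(y, X, beta_b, sigma2_b))
    return -2.0 * gaussian_loglik(y, X, beta_hat, sigma2_hat) + 2.0 * bias_sum / B

# Illustrative data only (not from the paper).
rng_demo = np.random.default_rng(4)
X_demo = rng_demo.normal(size=(50, 2))
y_demo = X_demo @ np.array([1.0, -1.0]) + rng_demo.normal(size=50)
eic_value = eic(X_demo, y_demo, ols_ml_fit, B=100, seed=0)
```

Each bootstrap replicate requires a full refit, which is why this criterion is computationally heavier than the closed-form criteria above.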

4 Numerical results 

In order to show the effectiveness of our proposed modeling strategy, we conducted some
numerical studies. Monte Carlo simulations and an analysis of real data are given to
illustrate the proposed bridge modeling procedure.

4.1 Simulated examples 

We performed a simulation study to validate our proposed modeling procedure. The
simulation has five settings. For Settings 1, 2, 3 and 4, the design matrix $X$ was generated
from a multivariate normal distribution with mean zero and variance 1, with the
correlation structure specified in each setting. The response vector $y$ was generated
from the true regression model

$$y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2 I_n), \tag{27}$$

where $I_n$ is an $n \times n$ identity matrix. Our five simulation settings are given as follows:

• Setting 1 : The training data and the test data consisted of 20 observations and 200
observations, respectively. The true parameter was $\beta = (3, 15, 7.5, 5, 2, 0, 0, 0, 0, 0)^T$
and $\sigma = 3$. The pairwise correlation between $x_i$ and $x_j$ was $\mathrm{cor}(x_i, x_j) = 0.5^{|i-j|}$.
This setting is the sparse case.

• Setting 2 : This setting is the same as Setting 1 except for $\beta_j = 10$ ($j =
1, \ldots, 10$). That is, Setting 2 is the dense case.

• Setting 3 : This setting is also the same as Setting 1. However, the true
parameter was $\beta = (5, 0, 0, 0, 0, 0, 0, 0)^T$ and $\sigma = 2$. In this model, we consider the
sparse case.

• Setting 4 : 100 observations and 400 observations were generated for the training
data and the test data, respectively. We set

$$\beta = (\underbrace{0, \ldots, 0}_{10}, \underbrace{5, \ldots, 5}_{10}, \underbrace{0, \ldots, 0}_{10}, \underbrace{3, \ldots, 3}_{10})^T \tag{28}$$

and $\sigma = 3$. The pairwise correlation between $x_i$ and $x_j$ was $\mathrm{cor}(x_i, x_j) = 0.95^{|i-j|}$.
This model also considers the sparse pattern.

• Setting 5 : The generating procedure of the training data and test data is the same
as in Setting 4. The true parameter was

$$\beta = (\underbrace{10, \ldots, 10}_{35}, \underbrace{0, \ldots, 0}_{5})^T \tag{29}$$

and $\sigma = 3$. The design matrix $X$ was generated as follows:

$$x_{ij} = Z_k + \varepsilon_j, \quad Z_k \sim N(0, 1), \quad j = 5k - 4, \ldots, 5k, \quad k = 1, \ldots, 7 \ \text{for all } i, \tag{30}$$

$$x_{ij} \sim N(0, 1), \quad j = 36, \ldots, 40 \ \text{for all } i, \tag{31}$$

where $\varepsilon_j$ were identically distributed as $N(0, 0.01)$ for $j = 1, \ldots, 35$. This model
was also the sparse case.
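The two kinds of design matrices used in these settings can be generated along the following lines. This is a sketch under one reading of the descriptions: the correlation between covariates $i$ and $j$ decays as $\rho^{|i-j|}$ in Settings 1 to 4, and in Setting 5 each observation draws its own group factors $Z_k$ and noise terms with standard deviation 0.1 (variance 0.01). The function names and the seed are hypothetical.

```python
import numpy as np

def ar1_design(n, p, rho, rng):
    """Rows drawn from N(0, C) with C[i, j] = rho ** |i - j| (Settings 1 to 4)."""
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    return rng.multivariate_normal(np.zeros(p), cov, size=n), cov

def grouped_design(n, rng, n_groups=7, group_size=5, n_noise=5, eps_sd=0.1):
    """Seven groups of five near-copies of a latent factor, plus five independent columns (Setting 5)."""
    Z = rng.normal(size=(n, n_groups))
    grouped = np.repeat(Z, group_size, axis=1) + eps_sd * rng.normal(size=(n, n_groups * group_size))
    independent = rng.normal(size=(n, n_noise))
    return np.hstack([grouped, independent])

def simulate_response(X, beta, sigma, rng):
    """Response from the linear model y = X beta + eps with eps ~ N(0, sigma^2 I_n)."""
    return X @ beta + sigma * rng.normal(size=X.shape[0])

rng = np.random.default_rng(3)
X_ar, cov_ar = ar1_design(20, 10, 0.5, rng)
X_grouped = grouped_design(100, rng)
```

Columns within a group of the grouped design are almost collinear, which is the situation in which the choice of penalty matters most.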

We fitted the bridge regression models to the simulated data. The regularization
parameter $\lambda$ and the tuning parameter $q$ in the bridge penalty were selected by GBIC,
mAIC, mBIC, AICc, CV, GCV and EIC, where we set the candidate values of $\lambda$ and $q$ to
$\{10^{-0.1i+3};\ i = 1, \ldots, 100\}$ and $\{0.1, 0.4, 0.7, 1, 1.3, 1.7, 2, 2.3, 2.7\}$, respectively.
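The candidate grids and the selection step can be sketched as below, assuming the grid for $\lambda$ has the form $10^{-0.1i+3}$. The function `select_by_criterion` and the stand-in `toy_criterion` are hypothetical; in an actual run the criterion would fit the bridge model at each pair $(\lambda, q)$ and return, for example, its GBIC value.

```python
import numpy as np

# Candidate values: 100 values of lambda on a log scale and 9 values of q.
lambda_grid = 10.0 ** (-0.1 * np.arange(1, 101) + 3.0)
q_grid = np.array([0.1, 0.4, 0.7, 1.0, 1.3, 1.7, 2.0, 2.3, 2.7])

def select_by_criterion(criterion, lambdas, qs):
    """Return the (lambda, q) pair that minimizes criterion(lambda, q) over the grid."""
    best = min(((criterion(lam, q), lam, q) for lam in lambdas for q in qs), key=lambda t: t[0])
    return best[1], best[2]

def toy_criterion(lam, q):
    """Stand-in for a model selection criterion, minimized near lambda = 10^-2.1 and q = 0.7."""
    return (np.log10(lam) + 2.1) ** 2 + (q - 0.7) ** 2

best_lam, best_q = select_by_criterion(toy_criterion, lambda_grid, q_grid)
```

An exhaustive search over the 900 pairs is simple and adequate when the criterion has a closed form, as the GBIC does once the model is fitted.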

We computed the mean squared error (MSE) defined by $\mathrm{MSE} = \sum_{i=1}^{n} (\hat y_i - y_i^*)^2/n$,
where $y_1^*, \ldots, y_n^*$ denote test data for the response variable generated from the true model.
Also, the means and standard deviations of the adjusted parameters $\lambda$ and $q$ were
computed. The simulation results were obtained by averaging over 100 Monte Carlo trials.
Table 1: Comparisons of the mean squared error (MSE) based on various criteria for the
Setting 1. Figures in parentheses give estimated standard deviations.

            GBIC      mAIC      mBIC      AICc      CV        GCV       EIC

MSE         15.67     17.13     16.18     15.80     16.27     16.02     17.89

            (6.12)    (6.83)    (6.15)    (5.99)    (6.08)    (6.00)    (11.39)

log10(λ)    -1.137    -0.803    -0.544    -0.412    -0.599    -0.593    -0.625

            (0.4138)  (0.3343)  (0.2198)  (0.1849)  (0.3875)  (0.2425)  (0.7058)

q           0.598     0.946     0.805     0.745     0.832     0.841     0.890

            (0.2670)  (0.3229)  (0.2500)  (0.1777)  (0.3675)  (0.2778)  (0.2455)

Table 2: Comparisons of the mean squared error (MSE) based on various criteria for the
Setting 2. Figures in parentheses give estimated standard deviations.

            GBIC      mAIC      mBIC      AICc      CV        GCV       EIC

MSE         20.72     21.23     21.42     24.02     21.82     21.63     81.06

            (8.03)    (8.43)    (8.67)    (11.95)   (10.16)   (8.90)    (240.06)

log10(λ)    -4.148    -1.106    -1.007    -0.932    -0.983    -0.963    -2.518

            (0.084)   (0.975)   (0.972)   (0.894)   (1.557)   (0.942)   (0.369)

q           2.700     1.140     1.176     1.372     1.289     1.201     2.560

            (0.000)   (0.639)   (0.638)   (0.652)   (1.013)   (0.639)   (0.243)
Table 3: Comparisons of the mean squared error (MSE) based on various criteria for the
Setting 3. Figures in parentheses give estimated standard deviations.

            GBIC      mAIC      mBIC      AICc      CV        GCV       EIC

MSE         4.986     5.836     5.213     5.068     5.470     5.551     5.500

            (1.158)   (1.744)   (1.456)   (1.385)   (1.488)   (1.549)   (1.753)

log10(λ)    -0.741    -0.550    -0.158    -0.352    -0.498    -0.477    -0.618

            (0.3900)  (0.3672)  (0.4311)  (0.2921)  (0.7602)  (0.2714)  (0.9798)

q           0.466     0.844     0.556     0.565     0.652     0.754     0.778

            (0.2507)  (0.4699)  (0.4522)  (0.3447)  (0.5003)  (0.4377)  (0.4760)



Table 4: Comparisons of the mean squared error (MSE) based on various criteria for the
Setting 4. Figures in parentheses give estimated standard deviations.

            GBIC      mAIC      mBIC      AICc      CV        GCV       EIC

MSE         11.76     11.93     12.21     11.92     11.87     11.87     11.54

            (1.199)   (1.231)   (1.375)   (1.254)   (1.247)   (1.247)   (0.994)

log10(λ)    -2.094    -0.788    -0.548    -0.664    -0.710    -0.699    -1.834

            (0.2173)  (0.2818)  (0.0559)  (0.0785)  (0.1267)  (0.1010)  (0.3590)

q           0.874     1.030     1.000     1.003     1.009     1.006     1.890

            (0.1715)  (0.0904)  (0.0000)  (0.0300)  (0.0514)  (0.0422)  (0.3729)
Table 5: Comparisons of the mean squared error (MSE) based on various criteria for the
Setting 5. Figures in parentheses give estimated standard deviations.

            GBIC      mAIC      mBIC      AICc      CV        GCV       EIC

MSE         14.39     14.62     15.41     14.91     14.72     14.73     10.95

            (1.566)   (1.861)   (2.028)   (1.890)   (1.868)   (1.872)   (1.001)

log10(λ)    -3.768    -0.945    -0.779    -0.858    -0.910    -0.901    -2.925

            (1.117)   (0.107)   (0.068)   (0.066)   (0.092)   (0.088)   (0.527)

q           1.827     1.009     1.000     1.000     1.006     1.006     2.34

            (0.8610)  (0.0514)  (0.0000)  (0.0000)  (0.0422)  (0.0422)  (0.3613)



The results are shown in Tables 1, 2, 3, 4 and 5. The values in parentheses indicate standard
deviations for the means.

The simulation results are summarized as follows. In Settings 1, 2 and 3, all criteria
provide an appropriate value of the tuning parameter $q$; i.e., the selected $q$ is
larger than 1 when the structure of the coefficient parameter $\beta$ is dense, and $0 < q \le 1$ is
selected when the structure of the coefficient parameter $\beta$ is sparse. For Setting 4, the
GBIC and the mBIC yield sparse solutions for the coefficient vector $\beta$, while the other criteria
produce dense solutions. In the fifth setting, the mBIC and the AICc select the
appropriate value of the tuning parameter $q$, whereas the other criteria, including the GBIC,
which is the proposed criterion in this paper, do not. However, the GBIC is superior to the
other criteria in almost all cases in the sense of minimizing the MSE, and the MSEs for the
GBIC have relatively small standard deviations among the criteria, except for the EIC in
Settings 4 and 5. Note that the EIC appears to be unstable, since it provides
the worst MSEs in Settings 1 and 2 but the smallest MSEs in
Settings 4 and 5. In addition, the EIC requires a heavy computational load, and hence
our proposed criterion GBIC seems to be useful from the viewpoints of both minimizing
the MSE and computational time.

We also compared the bridge regression models with the GBIC to OLS, ridge, lasso
and elastic net (ENet). An adjusted parameter included in ridge regression was selected
by the leave-one-out cross-validation, and adjusted parameters involved in lasso and ENet
were selected by the five-fold cross-validation. In order to evaluate the performance of
each model, we computed the MSE and drew boxplots of the values for the 100
trials of Monte Carlo simulations. Figure 1 shows the boxplots of the MSEs. In almost all
situations, our proposed bridge regression modeling performs well; i.e., it produces a
relatively small median with small variance.

Figure 1: Boxplots of the MSEs. The left top panel shows the result for the Setting 1,
the right top panel that for the Setting 2, the left bottom panel that for the Setting 3,
the right bottom panel that for the Setting 4 and the center bottom panel that for the
Setting 5.

Table 6: Prediction errors for pollution data set.

Method              bridge      OLS         ridge       lasso       ENet

Prediction error    1663.516    1822.312    1817.689    1735.713    1720.655

4.2 Analysis of real data 

We applied the bridge regression modeling evaluated by the GBIC to the pollution data
set. These data were analyzed by McDonald and Schwing (1973), Luo et al. (2006) and
Park and Yoon (2011). The data set consists of 60 observations and 15 covariates. The
response variable is the total age-adjusted mortality rate obtained for the years 1959-1961
for 201 Standard Metropolitan Statistical Areas. The data set is available from the
SMPracticals package in the software R.

In order to validate prediction errors, we randomly divided the data set into 40 training
data and 20 test data. Using the training data set, we constructed the regression models
with the bridge penalty. The values of the regularization parameter $\lambda$ and the tuning
parameter $q$ were chosen by using the GBIC. Here, we set the candidate values of $\lambda$ and $q$
to $\{10^{-0.1i+3};\ i = 1, \ldots, 100\}$ and $\{0.1, 0.25, 0.4, 0.55, 0.7, 0.85, 1, 1.3, 1.7, 2\}$, respectively.
The selected values of the adjusted parameters were $\lambda = 0.007943$ and $q = 0.7$.

We compared the performance of our modeling procedure with that of OLS, ridge,
lasso and ENet. Table 6 summarizes the prediction errors of these methods. We observe
that the bridge regression model outperforms the other methods. Table 7 lists the variables
selected using the entire pollution data set. Lasso and ENet choose the same
variables as McDonald and Schwing (1973), and our proposed method yields the smallest
model among them. From these descriptions, we conclude that at least variables 1, 8, 9
and 14 may be relevant to the response variable, since the variables are included in all
methods.

Table 7: Selected variables for pollution data set.

Method                   Selected variables

McDonald and Schwing     (1,2,6,8,9,14)

Luo et al.               (1,2,6,9,14)

Park and Yoon (LQA)      (1,2,3,6,8,9,14)

Park and Yoon (LLA)      (1,2,3,6,7,8,9,14,15)

lasso                    (1,2,6,8,9,14)

ENet                     (1,2,6,8,9,14)

bridge with GBIC         (1,8,9,14)

5 Concluding remarks 

In this paper, we have considered the problem of evaluating linear regression models
estimated by the penalized maximum likelihood method with the bridge penalty. In order to
select the optimal values of the adjusted parameters, including the regularization
parameter in the penalized log-likelihood function and the tuning parameter in the bridge
penalty, we have proposed a model selection criterion based on Bayesian theory. Monte
Carlo simulations and the analysis of a real data set have shown that our proposed modeling
procedure performs well in various situations, in the sense of yielding relatively lower
prediction errors than previously developed criteria and methods. Future work includes
applying the proposed procedure to high-dimensional data sets and extending our models to
the framework of generalized linear models.